Overview

Dataset Statistics

Number of Variables 8
Number of Rows 10001358
Missing Cells 52158
Missing Cells (%) 0.1%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 1.1 GB
Average Row Size in Memory 119.2 B
Variable Types
  • Numerical: 7
  • Categorical: 1
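
The derived overview figures can be cross-checked from the raw counts. The following is a minimal sketch, assuming that the exact row count is 10001358 (the MONTHS_BALANCE negatives count, which covers 100.0% of rows) and that "Missing Cells (%)" means missing cells divided by rows × variables. The variable names are illustrative only.

```python
# Cross-check of the overview figures using values quoted in this report.
n_rows = 10_001_358      # row count, taken from the MONTHS_BALANCE negatives count
n_vars = 8
missing_cells = 52_158
avg_row_bytes = 119.2

total_cells = n_rows * n_vars
missing_pct = 100 * missing_cells / total_cells   # share of empty cells
total_gib = n_rows * avg_row_bytes / 2**30        # assumes "GB" means 2**30 bytes

print(f"missing cells: {missing_pct:.3f}%")
print(f"estimated size: {total_gib:.2f} GiB")
```

Both results round to the reported 0.1% and 1.1 GB.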

Dataset Insights

  • SK_DPD and SK_DPD_DEF have similar distributions
  • CNT_INSTALMENT is skewed
  • CNT_INSTALMENT_FUTURE is skewed
  • SK_DPD is skewed
  • SK_DPD_DEF is skewed
  • MONTHS_BALANCE has 10001358 (100.0%) negatives
  • CNT_INSTALMENT_FUTURE has 1185960 (11.86%) zeros
  • SK_DPD has 9706131 (97.05%) zeros
  • SK_DPD_DEF has 9887389 (98.86%) zeros

Variables


SK_ID_PREV

numerical

Approximate Distinct Count 936325
Approximate Unique (%) 9.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 160021728 B
Mean 1.9032e+06
Minimum 1000001
Maximum 2843499
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • SK_ID_PREV is only marginally skewed right (γ1 = 0.0442); the distribution is close to symmetric

Quantile Statistics

Minimum 1000001
5-th Percentile 1.0845e+06
Q1 1.4355e+06
Median 1.899e+06
Q3 2.3698e+06
95-th Percentile 2.7514e+06
Maximum 2843499
Range 1843498
IQR 934350.5

Descriptive Statistics

Mean 1.9032e+06
Standard Deviation 535846.5307
Variance 2.8713e+11
Sum 1.9035e+13
Skewness 0.04423
Kurtosis -1.2162
Coefficient of Variation 0.2815
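
A skewness near 0 together with a kurtosis near -1.2 is what an evenly spread (uniform) variable produces, which is plausible for an identifier column. The following is a minimal sketch, assuming the report uses population (biased) central moments and excess kurtosis; the report itself does not say which estimator it uses. The helper `moment_stats` is hypothetical and not part of any library.

```python
def moment_stats(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Evenly spaced synthetic IDs stand in for a uniform distribution.
skew, kurt = moment_stats(list(range(1000)))
print(skew, round(kurt, 4))
```

The synthetic data gives a skewness of zero and an excess kurtosis very close to -1.2, in line with the values reported for SK_ID_PREV.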

SK_ID_CURR

numerical

Approximate Distinct Count 337252
Approximate Unique (%) 3.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 160021728 B
Mean 278403.8633
Minimum 100001
Maximum 456255
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • SK_ID_CURR is only marginally skewed left (γ1 = -0.0031); the distribution is close to symmetric

Quantile Statistics

Minimum 100001
5-th Percentile 118179
Q1 189809
Median 279032
Q3 367713
95-th Percentile 438676.95
Maximum 456255
Range 356254
IQR 177904
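
Range and IQR follow directly from the quantiles listed above. A short check using the SK_ID_CURR values as quoted (variable names are illustrative):

```python
# Quantile values for SK_ID_CURR as quoted in the report.
minimum, q1, q3, maximum = 100_001, 189_809, 367_713, 456_255

value_range = maximum - minimum   # spread of the full data
iqr = q3 - q1                     # spread of the middle 50% of values

print(value_range, iqr)  # 356254 177904
```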

Descriptive Statistics

Mean 278403.8633
Standard Deviation 102763.7451
Variance 1.056e+10
Sum 2.7844e+12
Skewness -0.003128
Kurtosis -1.1968
Coefficient of Variation 0.3691
  • SK_ID_CURR is not normally distributed (p-value 1.971606694258156e-06)

MONTHS_BALANCE

numerical

Approximate Distinct Count 96
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 160021728 B
Mean -35.0126
Minimum -96
Maximum -1
Zeros 0
Zeros (%) 0.0%
Negatives 10001358
Negatives (%) 100.0%
  • MONTHS_BALANCE is skewed left (γ1 = -0.6728)

Quantile Statistics

Minimum -96
5-th Percentile -85
Q1 -53
Median -27
Q3 -13
95-th Percentile -4
Maximum -1
Range 95
IQR 40

Descriptive Statistics

Mean -35.0126
Standard Deviation 26.0666
Variance 679.4661
Sum -3.5017e+08
Skewness -0.6728
Kurtosis -0.7107
Coefficient of Variation -0.7445
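
The coefficient of variation is the standard deviation divided by the mean, so it takes the sign of the mean. MONTHS_BALANCE is entirely negative, hence the negative value. A quick check using the rounded figures above (the small gap to the reported variance is a rounding effect):

```python
mean = -35.0126
std = 26.0666

cv = std / mean        # sign follows the mean
variance = std ** 2    # close to the reported 679.4661

print(round(cv, 4))  # -0.7445
```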

CNT_INSTALMENT

numerical

Approximate Distinct Count 73
Approximate Unique (%) 0.0%
Missing 26071
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 159604592 B
Mean 17.0897
Minimum 1
Maximum 92
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • CNT_INSTALMENT is skewed right (γ1 = 1.6017)

Quantile Statistics

Minimum 1
5-th Percentile 6
Q1 10
Median 12
Q3 24
95-th Percentile 48
Maximum 92
Range 91
IQR 14

Descriptive Statistics

Mean 17.0897
Standard Deviation 11.9951
Variance 143.8814
Sum 1.7047e+08
Skewness 1.6017
Kurtosis 2.4469
Coefficient of Variation 0.7019
  • CNT_INSTALMENT is not normally distributed (p-value 2.0986407779691978e-13)
  • CNT_INSTALMENT has 498724 outliers

CNT_INSTALMENT_FUTURE

numerical

Approximate Distinct Count 79
Approximate Unique (%) 0.0%
Missing 26087
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 159604336 B
Mean 10.4838
Minimum 0
Maximum 85
Zeros 1185960
Zeros (%) 11.9%
Negatives 0
Negatives (%) 0.0%
  • CNT_INSTALMENT_FUTURE is skewed right (γ1 = 1.8467)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 3
Median 7
Q3 14
95-th Percentile 35
Maximum 85
Range 85
IQR 11

Descriptive Statistics

Mean 10.4838
Standard Deviation 11.1091
Variance 123.4112
Sum 1.0458e+08
Skewness 1.8467
Kurtosis 3.7133
Coefficient of Variation 1.0596
  • CNT_INSTALMENT_FUTURE is not normally distributed (p-value 1.0165562290765968e-09)
  • CNT_INSTALMENT_FUTURE has 694783 outliers

NAME_CONTRACT_STATUS

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 712427928 B
  • The most frequent value (Active) occurs about 12.29 times as often as the second most frequent value (Completed)

Length

Mean 6.2331
Standard Deviation 0.8631
Median 6
Minimum 3
Maximum 21

Sample

1st row Active
2nd row Active
3rd row Active
4th row Active
5th row Active

Letter

Count 62322639
Lowercase Letter 52321277
Space Separator 17019
Uppercase Letter 10001362
Dash Punctuation 0
Decimal Number 0
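
The character-class counts are consistent with the Length statistics. With no dashes or digits reported, every character is taken to be a letter or a space, so total characters divided by the row count should reproduce the mean length. A sketch, assuming the row count is 10001358 (the MONTHS_BALANCE negatives count, 100.0% of rows) and that no other character classes occur:

```python
n_rows = 10_001_358   # row count, from the MONTHS_BALANCE negatives count
lowercase = 52_321_277
uppercase = 10_001_362
spaces = 17_019

letters = lowercase + uppercase            # matches the reported letter "Count"
mean_length = (letters + spaces) / n_rows  # average characters per label

print(letters, round(mean_length, 4))  # 62322639 6.2331
```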
  • The top 2 categories (Active, Completed) account for over 50.0% of rows

SK_DPD

numerical

Approximate Distinct Count 3400
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 160021728 B
Mean 11.6069
Minimum 0
Maximum 4231
Zeros 9706131
Zeros (%) 97.0%
Negatives 0
Negatives (%) 0.0%
  • SK_DPD is skewed right (γ1 = 14.8991)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 4231
Range 4231
IQR 0

Descriptive Statistics

Mean 11.6069
Standard Deviation 132.714
Variance 17613.0173
Sum 1.1609e+08
Skewness 14.8991
Kurtosis 255.3221
Coefficient of Variation 11.434
  • SK_DPD is not normally distributed (p-value 4.22744628287715e-25)
  • SK_DPD has 295227 outliers

SK_DPD_DEF

numerical

Approximate Distinct Count 2307
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 160021728 B
Mean 0.6545
Minimum 0
Maximum 3595
Zeros 9887389
Zeros (%) 98.9%
Negatives 0
Negatives (%) 0.0%
  • SK_DPD_DEF is skewed right (γ1 = 66.3399)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 3595
Range 3595
IQR 0

Descriptive Statistics

Mean 0.6545
Standard Deviation 32.7625
Variance 1073.3808
Sum 6.5456e+06
Skewness 66.3399
Kurtosis 4836.5469
Coefficient of Variation 50.0597
  • SK_DPD_DEF is not normally distributed (p-value 4.226514625510636e-25)
  • SK_DPD_DEF has 113969 outliers
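
The outlier counts for SK_DPD and SK_DPD_DEF match what a 1.5 × IQR fence rule would give: with Q1 = Q3 = 0 both fences collapse to 0, so every non-zero value counts as an outlier. The report does not state which rule it applies, so the rule here is an assumption, and the function name `iqr_fences` is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences under the (assumed) k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

n_rows = 10_001_358  # row count, from the MONTHS_BALANCE negatives count
zeros = {"SK_DPD": 9_706_131, "SK_DPD_DEF": 9_887_389}

low, high = iqr_fences(q1=0, q3=0)  # both fences are 0 when IQR is 0
# Neither column has missing or negative values, so non-zero means above the fence.
nonzero = {name: n_rows - z for name, z in zeros.items()}

print(low, high, nonzero)
```

The non-zero counts come out as 295227 and 113969, equal to the reported outlier counts. Under the same assumed rule, `iqr_fences(10, 24)` places the CNT_INSTALMENT fences at -11 and 45.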

Interactions

Correlations

Missing Values